Background

The FIRST Robotics Competition (FRC) is a competition in which teams build a robot over roughly six weeks, then compete at one or more events in an attempt to secure a spot at the championship.

Competition Structure

An event consists of two parts: qualification and playoffs. During the qualification matches, each team plays alongside algorithmically assigned partners against an opposing alliance. An alliance is a group of three teams working together to beat the other alliance; one alliance is designated red and the other blue. For the playoffs, alliances are picked by the top eight ranked teams. At most events, an alliance captain picks two other teams to play with for the rest of the playoffs, and if one of the alliance's robots breaks, a backup robot can be requested to replace it. At the championship, each captain picks three other teams, so the backup is built into the alliance, because only three robots ever compete at a time.

A regional is a competition of roughly 50-60 teams. The winning alliance earns a spot at the championship; if a team on the winning alliance has already qualified, a team from the second-place alliance goes instead. A district is a geographical area in which teams instead attend two events of roughly 30-40 teams each, followed by a district championship, and the top teams at the district championship advance to the championship.

Since 2005, matches have been played between two alliances of three teams each. In 1999-2000 and 2002-2004 alliances had two teams, 2001 had four-team alliances with no opposing alliance (4v0), and from 1992 to 1998 there were no alliances at all. We only have match data from 2002 to the present.

Each year, teams are challenged by a new competition. Everything from making robots play basketball to attacking a castle has been a theme. One year, teams may be asked to collect and launch balls into a target (2002, 2004, 2006, 2008, 2009, 2010, 2012, 2014, 2016, 2017, 2019) and the next year be asked to pick up inflatable shapes and place them onto pegs (2007, 2011). There have also been years with stacking bins (2003, 2015), a year with tetras (essentially hollow, stackable tetrahedrons) (2005), a year of ultimate frisbee (2013), and a year with milk crates (2018).

Team Locations

The FIRST Robotics Competition (FRC) started in 1992 in a high school gym in Manchester, NH. In 1992, there were 28 teams at 1 competition. In 2019, there were over 3800 active teams competing at over 160 events worldwide.

Teams are located all over the world, with 31 countries represented in 2019.

teams %>%
  filter(2019 %in% years) %>%
  group_by(country) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  knitr::kable(col.names = c("Country", "Number of Teams"),
               align = "ll")
| Country | Number of Teams |
|:--|:--|
| USA | 3121 |
| Canada | 270 |
| Turkey | 80 |
| Mexico | 76 |
| China | 67 |
| Israel | 66 |
| Australia | 45 |
| Chinese Taipei | 21 |
| Brazil | 14 |
| Dominican Republic | 4 |
| Netherlands | 4 |
| Chile | 3 |
| India | 3 |
| Japan | 3 |
| Switzerland | 3 |
| United Kingdom | 3 |
| Colombia | 2 |
| France | 2 |
| New Zealand | 2 |
| Poland | 2 |
| Afghanistan | 1 |
| Czech Republic | 1 |
| Germany | 1 |
| Indonesia | 1 |
| Libya | 1 |
| Norway | 1 |
| South Africa | 1 |
| Sweden | 1 |
| Ukraine | 1 |
| Venezuela | 1 |
| Vietnam | 1 |

The vast majority of the teams are located in the US. Every state has at least 1 team.

teams %>%
  filter(2019 %in% years & country == "USA") %>%
  mutate(state_prov = case_when(
    state_prov == "CT" ~ "Connecticut",
    state_prov == "MI" ~ "Michigan",
    state_prov == "TX" ~ "Texas",
    TRUE ~ state_prov
  )) %>%
  group_by(state_prov) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  knitr::kable(col.names = c("State/Province", "Number of Teams"),
               align = "ll")
| State/Province | Number of Teams |
|:--|:--|
| Michigan | 540 |
| California | 310 |
| Minnesota | 221 |
| Texas | 176 |
| New York | 174 |
| Washington | 98 |
| Florida | 89 |
| Georgia | 85 |
| New Jersey | 83 |
| Virginia | 75 |
| Missouri | 74 |
| Massachusetts | 73 |
| North Carolina | 70 |
| Pennsylvania | 68 |
| Ohio | 65 |
| Illinois | 63 |
| Wisconsin | 59 |
| Indiana | 57 |
| Arizona | 56 |
| Connecticut | 55 |
| Oregon | 52 |
| Oklahoma | 47 |
| New Hampshire | 45 |
| South Carolina | 42 |
| Maryland | 41 |
| Colorado | 40 |
| Arkansas | 39 |
| Louisiana | 39 |
| Tennessee | 33 |
| Hawaii | 28 |
| Iowa | 26 |
| Utah | 25 |
| Kansas | 23 |
| Nevada | 22 |
| Maine | 21 |
| Idaho | 15 |
| Alabama | 14 |
| Kentucky | 13 |
| Mississippi | 12 |
| District of Columbia | 10 |
| New Mexico | 10 |
| North Dakota | 5 |
| Rhode Island | 5 |
| Vermont | 5 |
| West Virginia | 4 |
| Wyoming | 4 |
| Nebraska | 3 |
| Delaware | 2 |
| Montana | 2 |
| Alaska | 1 |
| Puerto Rico | 1 |
| South Dakota | 1 |

This is the distribution of US teams (note that the 28 Hawaiian teams, 1 Alaskan team, and 1 Puerto Rican team are omitted from the map):

zip.codes <- read.csv("zipcode.csv")
teams %>%
  filter(2019 %in% years & country == "USA") %>%
  mutate(postal_code = parse_number(postal_code)) %>%
  left_join(zip.codes, by = c("postal_code" = "ZIP")) %>%
  filter(!is.na(LAT) & !is.na(LNG)) %>%
  filter(LNG > -140 & LAT > 20) %>%
  group_by(postal_code) %>%
  summarize(count = n(), LAT = first(LAT), LNG = first(LNG)) %>%
  ggplot(aes(LNG, LAT, alpha = count)) +
  geom_point() +
  borders("state") +
  coord_quickmap()

Performance Within a Season

To evaluate performance within a season, we can compare each week of competition. The regular season spans several weekends, and all events held on the same weekend are classified as belonging to that “week” of competition.
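As a hedged illustration of that binning (the kickoff date, table, and column names below are invented for illustration, not taken from the real data):

```r
library(dplyr)

# Hypothetical sketch: assigning events to competition "weeks" by start date.
season_start <- as.Date("2017-02-22")  # invented season start

events_toy <- tibble(
  event_key  = c("2017aaa", "2017bbb", "2017ccc"),
  start_date = as.Date(c("2017-02-23", "2017-03-02", "2017-03-29"))
)

# Days since the season started, integer-divided into 7-day bins
events_toy <- events_toy %>%
  mutate(week = as.integer(start_date - season_start) %/% 7 + 1)
```

Every event whose start date falls inside the same 7-day window then shares a week number.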

In some years, performance does seem to improve, but in others it does not. For example, in 2017 the performance increased slightly as the weeks of the season progressed:

scores %>%
  filter(year == 2017 & week %in% 0:6) %>%
  ggplot() +
  geom_violin(aes(week, score)) +
  labs(title = "Score Distribution by Week - 2017")

But in the 2009 season, there isn’t any obvious improvement:

scores %>%
  filter(year == 2009 & week %in% 0:5) %>%
  ggplot() +
  geom_violin(aes(week, score)) +
  labs(title = "Score Distribution by Week - 2009")

If we include the championship events, we consistently see an improvement over the weeks prior. Using 2009 again, we see that District Championships (dcmp), Championship Divisions (cmpd) and Championship Finals (cmpf) have a very noticeable improvement in scores.

scores %>%
  filter(year == 2009) %>%
  ggplot() +
  geom_violin(aes(week, score)) +
  labs(title = "Score Distribution by Week - 2009")

In many years, the difference between the Championship Finals and the other levels can be drastic. Take the 2016 season as an example: scores improved only slightly as the weeks progressed, and the District Championships and Championship Divisions improved further, but the Championship Finals scores are much higher.

scores %>%
  filter(year == 2016) %>%
  ggplot() +
  geom_violin(aes(week, score)) +
  labs(title = "Score Distribution by Week - 2016")

Using a boxplot, we can see that the median score in a Championship Finals match is almost double the median score in the Championship Divisions.

scores %>%
  filter(year == 2016) %>%
  ggplot() +
  geom_boxplot(aes(week, score)) +
  labs(title = "Score Distribution by Week - 2016")

Comparing Performance Between Multiple Seasons

Because the competition changes every year, comparing raw scores from one year to the next is not a good measure of performance. For example, the 2010 competition had an average match score of 4.07, while the 2018 competition had an average match score of 291.90.

scores %>%
  group_by(year) %>%
  summarize(avg_score = mean(score)) %>%
  knitr::kable(col.names = c("Year", "Average Score"),
               align = "ll")
| Year | Average Score |
|:--|:--|
| 2002 | 29.921029 |
| 2003 | 48.985660 |
| 2004 | 59.851016 |
| 2005 | 28.246132 |
| 2006 | 35.213372 |
| 2007 | 30.064372 |
| 2008 | 42.879916 |
| 2009 | 56.692035 |
| 2010 | 4.079307 |
| 2011 | 38.120657 |
| 2012 | 25.047814 |
| 2013 | 64.813903 |
| 2014 | 100.211908 |
| 2015 | 75.583587 |
| 2016 | 85.289039 |
| 2017 | 233.390947 |
| 2018 | 291.902244 |
| 2019 | 54.930552 |

We found that a good way to compare across years is to compute a team’s performance relative to the rest of the teams in the competition that year. By calculating the average score achieved by each team, we can determine what percentile each team achieved in a given year.

team_avg_by_year <- matches_by_team %>%
  group_by(team, team_num, year) %>%
  summarize(avg_score = mean(score))

team_percentile_by_year <- data.frame(
  team = character(),
  team_num = integer(),
  year = integer(),
  avg_score = double(),
  percentile = double()
)

for (yr in unique(team_avg_by_year$year)) {
  x <- team_avg_by_year %>%
    filter(year == yr)
  x$percentile <- ecdf(x$avg_score)(x$avg_score) * 100
  
  team_percentile_by_year <- bind_rows(team_percentile_by_year, x) %>%
    arrange(team_num)
}

team_percentile_avg <- team_percentile_by_year %>%
  group_by(team, team_num) %>%
  summarize(avg_performance = mean(percentile)) %>%
  ungroup()
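The loop above can also be expressed as a grouped mutate; this sketch uses a small invented stand-in for `team_avg_by_year` so it runs on its own:

```r
library(dplyr)

# Invented stand-in for team_avg_by_year (three teams, two seasons)
toy_avg <- tibble(
  team_num  = c(1, 2, 3, 1, 2, 3),
  year      = c(2018, 2018, 2018, 2019, 2019, 2019),
  avg_score = c(10, 20, 30, 100, 300, 200)
)

# ecdf() is rebuilt within each year, so every team is ranked
# only against the other teams in that same season
toy_percentiles <- toy_avg %>%
  group_by(year) %>%
  mutate(percentile = ecdf(avg_score)(avg_score) * 100) %>%
  ungroup()
```

Because `mutate()` evaluates per group, the best team in each toy season lands at the 100th percentile and the worst at the 33rd, regardless of the very different raw scores.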

For example, the performance of Team 2855 (Max’s former team) is shown below:

team_percentile_by_year %>%
  filter(team_num == 2855) %>%
  ggplot(aes(as.factor(year), percentile, group = 1)) +
  geom_line() +
  ylim(0, 100) +
  labs(title = "Performance of Team 2855",
       x = "Year",
       y = "Percentile")

As shown here, Team 2855 has had a couple of good years, but overall has been an ok team at best.

We can compare this to Team 3691, based out of Northfield High School:

team_percentile_by_year %>%
  filter(team_num == 3691) %>%
  ggplot(aes(as.factor(year), percentile, group = 1)) +
  geom_line() +
  ylim(0, 100) +
  labs(title = "Performance of Team 3691",
       x = "Year",
       y = "Percentile")

At first glance, it seems like Team 3691 has been a better-performing team than Team 2855. We can average all of a team’s percentiles to determine an average performance.

team_percentile_avg %>%
  filter(team_num %in% c(2855, 3691)) %>%
  left_join(teams, by = c("team_num" = "team_number")) %>%
  arrange(team_num) %>%
  select(team_num, nickname, rookie_year, years_competed, avg_performance) %>%
  head(20) %>%
  knitr::kable(col.names = c("Team #", "Nickname", "Rookie Year", 
                             "Years Competed", "Average Performance"),
               align = "lllll")
| Team # | Nickname | Rookie Year | Years Competed | Average Performance |
|:--|:--|:--|:--|:--|
| 2855 | BEASTBOT | 2009 | 11 | 26.67849 |
| 3691 | RoboRaiders | 2011 | 10 | 47.94250 |

This confirms that Team 3691 is a better-performing team than Team 2855.

Top Performing Teams of All Time

We can use our average performance metric to determine the top teams of all time:

team_percentile_avg %>%
  left_join(teams, by = c("team_num" = "team_number")) %>%
  arrange(desc(avg_performance)) %>%
  select(team_num, nickname, rookie_year, years_competed, state_prov, avg_performance) %>%
  head(20) %>%
  knitr::kable(col.names = c("Team #", "Nickname", "Rookie Year", 
                             "Years Competed", "State/Province", "Average Performance"),
               align = "llllll")
| Team # | Nickname | Rookie Year | Years Competed | State/Province | Average Performance |
|:--|:--|:--|:--|:--|:--|
| 2056 | OP Robotics | 2007 | 14 | Ontario | 99.90801 |
| 2970 | eSchool eBots | 2009 | 1 | WI | 98.86567 |
| 5406 | Celt-X | 2015 | 6 | Ontario | 98.64966 |
| 2098 | Bulldogs | 2007 | 1 | GA | 98.45560 |
| 7457 | suPURDUEper Robotics | 2019 | 2 | Indiana | 97.89894 |
| 254 | The Cheesy Poofs | 1999 | 22 | California | 97.10775 |
| 1114 | Simbotics | 2003 | 18 | Ontario | 96.99516 |
| 3683 | Team DAVE | 2011 | 10 | Ontario | 96.98346 |
| 67 | The HOT Team | 1997 | 24 | Michigan | 96.91468 |
| 4414 | HighTide | 2012 | 2 | California | 96.80851 |
| 2753 | Team Overdrive | 2009 | 2 | NJ | 96.71322 |
| 7553 | OSTC - SWEET BOTS | 2019 | 2 | Michigan | 96.67553 |
| 5184 | TITANICS | 2014 | 1 | Alberta | 96.55172 |
| 4678 | CyberCavs | 2013 | 8 | Ontario | 96.19824 |
| 782 | Kilowatts | 2002 | 3 | CT | 96.15072 |
| 5172 | Gators | 2014 | 7 | Minnesota | 95.95098 |
| 71 | Team Hammond | 1996 | 25 | Indiana | 95.85849 |
| 7021 | TC Robotics | 2018 | 3 | Wisconsin | 95.73704 |
| 4917 | Sir Lancerbot | 2014 | 7 | Ontario | 95.12320 |
| 27 | Team RUSH | 1997 | 24 | Michigan | 95.11682 |

A team that stands out here is Team 2056 (aptly named “OP Robotics”), which has performed at an impressive level in all 14 years it has competed, landing in the 99.91st percentile on average. (Notice that the graph below only shows the 95th to 100th percentiles, and yet they’re still at the top.)

team_percentile_by_year %>%
  filter(team_num == 2056) %>%
  ggplot(aes(as.factor(year), percentile, group = 1)) +
  geom_point() +
  geom_line() +
  ylim(95, 100) +
  labs(title = "Performance of Team 2056",
       x = "Year",
       y = "Percentile")

Other teams that stand out are Teams 2970 and 2098, each of which competed for only a single season but was among the best teams that year.

Performance of Longest Competing Teams

We can use our average performance metric to compute the performance of some of the oldest teams. We have two possible ways to determine a team’s “age”:

  • The number of seasons they have competed in
  • The number of matches they have competed in

Number of Seasons Competed vs Average Performance

With the FIRST Robotics Competition starting back in 1992, there have been 29 seasons (including 2020). Only three teams have competed in every one of those seasons, though nine teams from 1992 are still active in 2020 (the other six took a year or more off at some point).

teams %>%
  filter(rookie_year == 1992 & 2020 %in% years) %>%
  arrange(team_number) %>%
  select(team_number, nickname, state_prov, rookie_year, years_competed) %>%
  knitr::kable(col.names = c("Team #", "Nickname", "State/Province",
                             "Rookie Year", "Seasons Competed"),
               align = "lllll")
| Team # | Nickname | State/Province | Rookie Year | Seasons Competed |
|:--|:--|:--|:--|:--|
| 20 | The Rocketeers | New York | 1992 | 24 |
| 45 | TechnoKats Robotics Team | Indiana | 1992 | 29 |
| 126 | Gael Force | Massachusetts | 1992 | 29 |
| 131 | C.H.A.O.S. | New Hampshire | 1992 | 27 |
| 148 | Robowranglers | Texas | 1992 | 28 |
| 151 | Tough Techs | New Hampshire | 1992 | 27 |
| 157 | AZTECHS | Massachusetts | 1992 | 27 |
| 190 | Gompei and the H.E.R.D. | Massachusetts | 1992 | 27 |
| 191 | X-CATS | New York | 1992 | 29 |

These teams have been around a long time, but that isn’t the norm: they are a sample of only 9 out of about 8000 teams that have competed over the 29 seasons. The average number of seasons competed is around six, with a median of four.
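Those two summary numbers come from a calculation along these lines (shown here on an invented `years_competed` column, since the real `teams` table isn't reproduced in this document):

```r
library(dplyr)

# Invented team lifespans; the real distribution has mean ~6 and median 4
teams_toy <- tibble(years_competed = c(1, 2, 3, 4, 4, 5, 8, 12, 29))

lifespan <- teams_toy %>%
  summarize(mean_seasons   = mean(years_competed),
            median_seasons = median(years_competed))
```

The long right tail (a handful of 25+ season teams) is what pulls the mean well above the median.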

teams %>%
  ggplot() +
  geom_violin(aes(years_competed, 0)) +
  labs(title = "Lifespan of Teams",
       x = "Seasons Competed",
       y = "")

Using the number of seasons competed, we can take the top 31 teams (31 because that is the cutoff between 24 and 25 seasons competed).

team_percentile_avg %>%
  left_join(teams, by = c("team_num" = "team_number")) %>%
  arrange(desc(years_competed), team_num) %>%
  select(team_num, nickname, state_prov, rookie_year, years_competed, avg_performance) %>%
  head(31) %>%
  mutate(nickname = ifelse(team_num == 173, "RAGE Robotics", nickname)) %>%
  knitr::kable(col.names = c("Team #", "Nickname", "State/Province", 
                             "Rookie Year", "Seasons Competed", 
                             "Average Performance"),
               align = "llllll")
| Team # | Nickname | State/Province | Rookie Year | Seasons Competed | Average Performance |
|:--|:--|:--|:--|:--|:--|
| 45 | TechnoKats Robotics Team | Indiana | 1992 | 29 | 82.05897 |
| 126 | Gael Force | Massachusetts | 1992 | 29 | 90.95443 |
| 191 | X-CATS | New York | 1992 | 29 | 84.97691 |
| 148 | Robowranglers | Texas | 1992 | 28 | 89.25748 |
| 81 | MetalHeads | Illinois | 1994 | 27 | 39.20181 |
| 131 | C.H.A.O.S. | New Hampshire | 1992 | 27 | 67.16112 |
| 151 | Tough Techs | New Hampshire | 1992 | 27 | 54.35748 |
| 155 | The Technonuts | Connecticut | 1994 | 27 | 66.55131 |
| 157 | AZTECHS | Massachusetts | 1992 | 27 | 64.12055 |
| 190 | Gompei and the H.E.R.D. | Massachusetts | 1992 | 27 | 78.16108 |
| 74 | Team CHAOS | Michigan | 1995 | 26 | 80.50757 |
| 108 | Robotics Team | Florida | 1995 | 26 | 68.52366 |
| 111 | WildStang | Illinois | 1996 | 26 | 93.69707 |
| 141 | WOBOT | Michigan | 1995 | 26 | 80.07415 |
| 166 | Chop Shop | New Hampshire | 1995 | 26 | 69.57478 |
| 173 | RAGE Robotics | Connecticut | 1995 | 26 | 80.15928 |
| 177 | Bobcat Robotics | Connecticut | 1995 | 26 | 87.18504 |
| 8 | Paly Robotics | California | 1996 | 25 | 56.74377 |
| 28 | Pierson Whalers | New York | 1996 | 25 | 66.48026 |
| 33 | Killer Bees | Michigan | 1996 | 25 | 93.39666 |
| 58 | The Riot Crew | Maine | 1996 | 25 | 85.25065 |
| 69 | HYPER | Massachusetts | 1998 | 25 | 82.95772 |
| 71 | Team Hammond | Indiana | 1996 | 25 | 95.85849 |
| 85 | B.O.B. (Built on Brains) | Michigan | 1996 | 25 | 85.06791 |
| 88 | TJ² | Massachusetts | 1996 | 25 | 77.73869 |
| 116 | Epsilon Delta | Virginia | 1996 | 25 | 58.06107 |
| 120 | Cleveland’s Team | Ohio | 1995 | 25 | 61.69404 |
| 121 | Rhode Warriors | Rhode Island | 1996 | 25 | 76.64927 |
| 171 | Cheese Curd Herd | Wisconsin | 1995 | 25 | 71.05626 |
| 175 | Buzz Robotics | Connecticut | 1996 | 25 | 92.02480 |
| 176 | Aces High | Connecticut | 1996 | 25 | 84.64481 |

team_percentile_avg %>%
  left_join(teams, by = c("team_num" = "team_number")) %>%
  filter(years_competed >= 25) %>%
  ggplot() +
  geom_point(aes(as.factor(years_competed), avg_performance)) +
  labs(title = "Performance of Longest Competing Teams by Season",
       x = "Seasons Competed",
       y = "Average Performance")

As we can see from this graph, more seasons competed doesn’t always equate to higher performance. But, if we zoom out a bit and include all teams, we see that there is a distinct improvement over time.

team_percentile_avg %>%
  left_join(teams, by = c("team_num" = "team_number")) %>%
  ggplot(aes(years_competed, avg_performance)) +
  geom_jitter(height = 0) +
  geom_smooth() +
  ylim(0, 100) +
  labs(title = "Seasons Competed vs Average Performance",
       x = "Seasons Competed",
       y = "Average Performance")

We can average all of the points for each number of seasons competed. The trend isn’t uniform and doesn’t always increase, especially above 15 seasons competed, where the sample size is very small.

team_percentile_avg %>%
  left_join(teams, by = c("team_num" = "team_number")) %>%
  filter(!is.na(years_competed)) %>%
  group_by(years_competed) %>%
  summarize(avg_avg_performance = mean(avg_performance), num_teams = n()) %>%
  ggplot() +
  geom_point(aes(as.factor(years_competed), avg_avg_performance, size = num_teams)) +
  geom_line(aes(as.factor(years_competed), avg_avg_performance, group = 1)) +
  ylim(0, 100) +
  labs(title = "Years Competed vs Average of Average Performance",
       x = "Years Competed",
       y = "Average of Average Performance",
       size = "Number of Teams")

Number of Matches Played vs Average Performance

This chart shows how many matches each team played across the years 2002 to 2019. As expected, teams with lower numbers have played more matches, because they have been around longer.

One team to notice is Team 9999. This team does not actually exist: the number 9999 is used as a placeholder for a team that has not yet received a number. This usually only happens during preseason and offseason events, though it appears to have happened at a week 0 regional event in 2004.
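If that placeholder would distort a per-team count, it can be dropped before summarizing; a minimal sketch on an invented stand-in for `matches_by_team`:

```r
library(dplyr)

# Invented match rows; 9999 is the placeholder "team" described above
matches_toy <- tibble(team_num = c(254, 254, 1114, 9999, 2056))

matches_played <- matches_toy %>%
  filter(team_num != 9999) %>%        # drop the placeholder before counting
  count(team_num, name = "played")    # one row per real team
```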

matches_played_by_team <- matches_by_team %>%
  group_by(team_num) %>%
  summarize(played = n()) %>%
  ungroup()

matches_played_by_team %>%
  ggplot(aes(team_num, played)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Matches Played by Team Number",
       x = "Team Number",
       y = "Matches Played")

We can look at the number of seasons competed vs number of matches played. Because teams can attend multiple events, plus some teams attend the championship, the number of matches played can vary greatly.

matches_played_by_team %>%
  left_join(teams, by = c("team_num" = "team_number")) %>%
  ggplot(aes(years_competed, played)) +
  geom_jitter(height = 0, width = 0.3) +
  geom_smooth() +
  labs(title = "Seasons Competed vs Matches Played",
       x = "Seasons Competed",
       y = "Matches Played")

We can then compare the number of matches played by a team with their average performance.

team_percentile_avg %>%
  left_join(matches_played_by_team, by = "team_num") %>%
  ggplot(aes(played, avg_performance)) +
  geom_point() +
  geom_smooth() +
  ylim(0, 100) +
  labs(title = "Matches Played vs Average Performance",
       x = "Matches Played",
       y = "Average Performance")

This time we see a very distinct improvement with more experience. This could be because of a few factors:

  • More seasons mean better performance as well as more matches played
  • Teams that attend the championship are usually there because they performed well in the regular season, meaning better performing teams get to play more matches
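One way to put a single hedged number on the relationship above is a rank correlation between matches played and average performance. Spearman correlation is our choice here, not something computed in the report, and the data points are invented:

```r
# Invented (matches played, average performance percentile) pairs
played          <- c(20, 50, 120, 300, 600)
avg_performance <- c(25, 40, 55, 75, 90)

# Spearman works on ranks, so it tolerates the nonlinear
# shape of the smoother in the plot above
rho <- cor(played, avg_performance, method = "spearman")
```

On this toy monotone data the coefficient is exactly 1; on the real data it would simply quantify how consistently more matches go with higher percentiles.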

Events Attended in a Season and Performance

Our question here is how attending multiple events impacts a team’s performance and development. First, we will divide teams by how many events they attended in a given season and compare their score percentiles across those groups.

team_events_per_year <- matches_by_team %>%
  group_by(team, team_num, year, event_key, event_type) %>%
  summarize() %>%
  ungroup() %>%
  group_by(team, team_num, year) %>%
  summarize(events = n(),
            non_cmp_events = length(event_key[event_type %in% c("regional", 
                                                                "district")]),
            attended_dcmp = "district_championship" %in% event_type | 
              "district_championship_division" %in% event_type,
            attended_cmp = "championship_division" %in% event_type,
            in_district = "district" %in% event_type)

team_events_per_year %>%
  left_join(team_percentile_by_year, by = c("team", "team_num", "year")) %>%
  filter(year == 2019) %>%
  ggplot() +
  geom_boxplot(aes(x = as.factor(events), y = percentile)) +
  labs(x = "Number of Events Attended",
       y = "Score Percentile for Year",
       title = "2019")

These results aren’t too shocking, but they give us some useful information to keep in mind later. Clearly, the average score percentile of a team that attends many events will be higher than that of a team that attends few, since higher-scoring teams are able to compete in more events.

It is interesting to see how high a team’s score percentile needs to be before it can expect to compete in many events. For example, in 2019 only a few teams below the 80th percentile competed in 5 events.

These boxplots don’t reveal many surprising trends, but they show how the number of events a team attends relates to how far it advances in a season. When we investigate the 2019 scores of the Championship Divisions and the Championship Finals, we can use the median number of events attended (4 and 5, respectively) to compare the scores of teams that attended many events against those that attended few.
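`team_avg_by_week_2019` is used below but not built in the chunks above; one plausible construction, assuming the `matches_by_team` columns used earlier, is sketched here on invented rows:

```r
library(dplyr)

# Invented 2019 match rows with the column names used in earlier chunks
matches_toy <- tibble(
  team      = c("frc1", "frc1", "frc2", "frc2"),
  year      = 2019,
  week      = c("1", "2", "1", "2"),
  event_key = c("ev_a", "ev_b", "ev_a", "ev_b"),
  score     = c(30, 45, 50, 60)
)

# One row per team per week: mean score that week, plus the team's
# total number of distinct events across the whole season
team_avg_by_week_toy <- matches_toy %>%
  filter(year == 2019) %>%
  group_by(team) %>%
  mutate(events = n_distinct(event_key)) %>%
  group_by(team, week, events) %>%
  summarize(avg.score = mean(score), .groups = "drop")
```

Carrying `events` through the grouping is what lets the later chunks filter on cutoffs like `events >= 4`.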

team_avg_by_week_2019 %>%
  filter(week == "cmpd") %>%
  ggplot() +
  geom_boxplot(aes(x = (events >= 4), y = avg.score)) +
  labs(x = "Team Attended 4 or More Events in 2019",
       y = "Average Score at Championship Division")

This is the breakdown of teams in the Championship Divisions based on the four-event cutoff we found previously. The plot shows that teams that attended 4 or more events in 2019 scored better at this event than those that didn’t. We can’t use this as evidence that a more experienced team will score better, though, since a winning team in a Championship Division advances to the Championship Finals, which adds to its event count. A more interesting plot is the same boxplot for the Championship Finals, below.

team_avg_by_week_2019 %>%
  filter(week == "cmpf") %>%
  ggplot() +
  geom_boxplot(aes(x = (events >= 5), y = avg.score)) +
  labs(x = "Team Attended 5 or More Events in 2019",
       y = "Average Score at Championship Finals")

This plot tells the opposite story. In this case, the teams that made it to the finals having competed in more events actually performed worse, which shows that experience within a single season isn’t the most reliable indicator of season performance.

team_avg_by_week_2019 %>%
  filter(events >= 6) %>%
  ggplot() +
  geom_line(aes(x = week, y = avg.score, group = team, color = team)) +
  geom_line(data = week_avg_score_2019, mapping = aes(x = week, y = avg.score, group = year), size = 2) +
  labs(x = "Week of 2019 Season",
       y = "Team Average Score")

Here is the score progression through the 2019 season for the teams that competed in 6 or more events, the most of any teams that year. All of them trend upward, indicating that they were still improving across the many events they attended.

The black line is the average score each week across all teams, not just the ones that attended many events. The scores of the more experienced teams are usually above this average each week. The black line has a few sudden jumps, explained by the transition from the regular season to the postseason and then to the finals: each step up in competition level eliminates lower-scoring teams.

Conclusion

In this report, we set out to analyze how a team’s performance relates to its experience. The first step was to calculate score percentiles so that different competition years could be compared, since raw scores vary drastically with the theme of the competition. Once we had percentiles, we looked at the year-to-year performance of individual teams such as OP Robotics and Max’s former team. Percentiles were also a valuable tool for comparing teams by experience, which we defined in two ways: number of seasons and number of matches. In both cases, the more experienced teams scored much better, with an average percentile rank roughly 40 points higher than newer teams. Finally, we looked at how teams progress through a season. The teams that played more matches in a season did score higher than average, and, more impressively, they continued to increase their scores as the season went on at a faster-than-average rate.

It is tough to quantify the effect of experience on performance. The experienced teams, measured both by seasons completed and by matches played, are certainly much better on average and score higher. We also saw that the teams that progress the most over the course of a season are the ones that attend the most events. However, we can’t attribute all of this to experience, since a team must be successful to become experienced: playing more matches requires reaching championship events, and completing more seasons requires lasting many years, both of which depend on strong performance. All in all, a more experienced team performs better in competition.